-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add support for forward translog reading #20163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds an index-level boolean to control translog read direction, threads it into Translog and MultiSnapshot to allow forward or backward snapshot iteration, changes translog generation handling before resync trimming, adjusts initial sync boundary computation, and adds tests exercising forward-read behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Config as Index Config
participant Settings as IndexSettings
participant Translog as Translog
participant Multi as MultiSnapshot
participant Files as Translog Files
Config->>Settings: load index settings
Settings->>Settings: read INDEX_TRANSLOG_READ_FORWARD_SETTING
Translog->>Settings: isTranslogReadForward()
Translog->>Multi: newMultiSnapshot(snapshots, onClose, readForward)
alt readForward == true
note right of Multi: iterator yields 0..N-1 (forward)
else
note right of Multi: iterator yields N-1..0 (backward)
end
loop replay snapshots
Multi->>Files: read snapshot at iterator index
Files-->>Multi: operations (per-file order preserved)
Multi->>Multi: deduplicate by seqNo and yield operations
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
🔇 Additional comments (1)
Comment |
|
❌ Gradle check result for c968213: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
c968213 to
234b0c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)
638-704: Consider extracting common test logic.The test duplicates 66 lines from
testSeqNoCollision()(lines 571-636), differing only in the setting on line 646. While explicit duplication in tests aids clarity, you might consider a parameterized test or helper method to reduce maintenance overhead if the test logic evolves.Example approach using a helper method:
public void testSeqNoCollision() throws Exception { testSeqNoCollisionWithSettings(Settings.builder() .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true) .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1") .build()); } public void testSeqNoCollisionWithReadForward() throws Exception { testSeqNoCollisionWithSettings(Settings.builder() .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true) .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_READ_FORWARD_SETTING.getKey(), true) .build()); } private void testSeqNoCollisionWithSettings(Settings settings) throws Exception { // Common test logic here }server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)
3876-3934: Forward-read snapshot test wiring looks correctThe test correctly:
- Constructs a translog with
INDEX_TRANSLOG_READ_FORWARD_SETTINGenabled viagetTranslogConfig(tempDir, settings).- Uses a separate
LocalTransloginstance so it doesn’t interfere with the class-leveltranslog.- Populates
viewsin generation order and asserts thatnewSnapshot()yields the concatenated operations in forward (oldest→newest) order.This aligns with the new forward-reading
MultiSnapshotsemantics and gives good coverage for multi-generation snapshots.As a minor optional clean-up, you could factor the common setup logic between this test and
testSnapshotReadOperationInReverseinto a small helper to reduce duplication, but it’s not strictly necessary here.server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1)
533-572: Read-forward recovery test mirrors baseline behavior appropriatelyThis test cleanly mirrors
testRecoveryTrimsLocalTranslogwhile:
- Enabling
INDEX_TRANSLOG_READ_FORWARD_SETTINGon the index.- Creating the replication group with an
InternalEngineFactory, keeping the execution model aligned with the baseline test.- Reusing the same flow (in-flight docs, replica promotion/demotion, recovery, and consistency assertions), which is exactly what we need to validate trimming semantics under forward translog reading.
The coverage looks solid and should catch regressions specific to read-forward mode.
If you want to reduce maintenance overhead later, you could extract the common body of this test and
testRecoveryTrimsLocalTransloginto a shared helper that takes theSettings(or aboolean readForward) as a parameter, but this duplication is acceptable as-is.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java(2 hunks)server/src/main/java/org/opensearch/index/translog/Translog.java(1 hunks)server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java(1 hunks)server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java(1 hunks)server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
🔇 Additional comments (4)
server/src/main/java/org/opensearch/index/translog/Translog.java (1)
764-765: LGTM!The implementation correctly reads the new
isTranslogReadForward()setting from IndexSettings and passes it to the MultiSnapshot constructor, enabling bidirectional translog reading based on index configuration.server/src/main/java/org/opensearch/index/IndexSettings.java (1)
914-914: Translog read‑forward flag wiring looks correct; confirm non‑dynamic lifecycle is intentionalThe new
translogReadForwardflag is:
- Backed by
INDEX_TRANSLOG_READ_FORWARD_SETTINGwith defaultfalse.- Read once at construction time from
settings.- Stored in a
finalfield and exposed viaisTranslogReadForward().This means the read direction is effectively fixed for the lifetime of the
IndexSettingsinstance and cannot be changed via a dynamic settings update (the setting also lacksProperty.Dynamicand there is no update consumer).If the intent is “configure at index creation (or close/open) only”, this is perfectly fine and keeps behavior simple. If you expect operators to toggle read‑forward on an already‑open index, you’d need to:
- Mark the setting as
Property.Dynamic.- Store it in a
volatilefield instead offinal.- Register an update consumer on
scopedSettingssimilar to other translog settings.Given the sensitivity of recovery semantics, the non‑dynamic approach is probably safer, but it’s worth double‑checking that this matches your operational expectations.
Also applies to: 1135-1135, 2102-2107
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (2)
57-70: Constructor wiring forreadForwardflag and index initialization looks soundThe added
readForwardflag and constructor wiring are consistent:
readForwardisfinal, so direction is immutable per snapshot.index = readForward ? 0 : translogs.length - 1;correctly handles both directions and the empty‑array case (0vs-1with loops guarding against out‑of‑bounds).No correctness issues here.
84-111: Forward vs backward traversal shares correct dedupe semantics; relies on trim preconditionThe new
next()implementation cleanly bifurcates behavior:
- Forward path (
readForward == true): iteratesindexfrom0totranslogs.length - 1, consuming eachTranslogSnapshotin order and usingseenSeqNo.getAndSet+overriddenOperationsin the same way as before.- Backward path keeps the original behavior (from
translogs.length - 1down to0) with identical per‑operation logic.A few points worth noting:
- The reuse of the same inner loop and
SeqNoSetlogic in both branches preserves the existing semantics of “first occurrence wins with respect to the chosen traversal order” and keepsoverriddenOperationsaccounting correct.- With backward reading, “first occurrence” means “latest generation wins”, which avoids stale operations from older primary terms.
- With forward reading, “first occurrence” means “oldest generation wins”. This matches the risk described in the new index setting Javadoc: if trimming of stale operations (
trimOperationOfPreviousPrimaryTerms(...)) hasn’t happened yet, forward traversal can surface outdated ops from older primary terms.Given that:
- As long as forward reading is only enabled in flows where stale‑term trimming is guaranteed to have run before constructing this
MultiSnapshot, this implementation is consistent with the documented behavior.- If you want extra safety, you could consider adding assertions or tighter invariants at the call site (e.g., around when forward snapshots are created) to ensure we don’t accidentally use forward traversal in a pre‑trim state, but that’s optional and outside this class.
Overall, the bidirectional iteration logic here is correct and symmetric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: Analyze (java)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/IndexSettings.java (3)
915-915: LGTM! Field declaration is correct.The field is appropriately declared as
finalsince the setting is not dynamic.
1136-1136: LGTM! Field initialization is correct.The initialization follows the standard pattern for non-dynamic IndexScope settings.
2103-2108: LGTM! Getter method is correctly implemented.The method follows standard naming conventions and provides appropriate access to the setting value.
|
❌ Gradle check result for fe0811d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for aaf78c5: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for aaf78c5: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for aaf78c5: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
aaf78c5 to
f5c7303
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/IndexSettings.java (1)
1136-1137: Consider usingscopedSettings.getfor consistencyHere you read the new setting directly from
settings:this.translogReadForward = INDEX_TRANSLOG_READ_FORWARD_SETTING.get(settings);For consistency with nearby index-level settings, consider:
this.translogReadForward = scopedSettings.get(INDEX_TRANSLOG_READ_FORWARD_SETTING);This keeps the access pattern uniform and future-proofs things if the setting ever becomes dynamic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java(4 hunks)server/src/main/java/org/opensearch/index/translog/Translog.java(1 hunks)server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java(1 hunks)server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java(1 hunks)server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- server/src/main/java/org/opensearch/index/translog/Translog.java
- CHANGELOG.md
- server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java
- server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java
- server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java
- server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: Analyze (java)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: detect-breaking-change
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/IndexSettings.java (3)
915-915: Final field for read-forward flag is appropriateUsing a
final boolean translogReadForwardhere matches the non-dynamic nature of the setting and is concurrency-safe for read-only access.
2103-2108: Accessor for translog read-forward flag is straightforward and clearThe getter and its Javadoc are aligned with the setting semantics and existing
isXyzEnabled()patterns on this class.
199-218: New translog read-forward setting and Javadoc look good; verify registration in IndexScopedSettingsThe setting definition and Javadoc are clear and accurately document the rare stale-operation edge case. Since this is declared with only
Property.IndexScope(non-dynamic), it's fixed for the lifetime ofIndexSettings, which is appropriate for this use case.Verify that
INDEX_TRANSLOG_READ_FORWARD_SETTINGis registered in the appropriateIndexScopedSettingsbuilt-in set (likelyBUILT_IN_INDEX_SETTINGSor similar) so that index creation and updates recognize this setting and do not report it as unknown.
|
❌ Gradle check result for f5c7303: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
@xuxiong1 -- Looks like the latest test failure is related to this change |
|
❌ Gradle check result for 3c85d10: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Was able to reproduce with the seed. I think the issue is from the primary replica re-sync process, the new primary would re-sync its operations (from global checkpoint to max seqNo) to the old primary (now replica), which would cause the duplicate operations (same seqNo, different term) in the old primary's translog. The re-sync process also triggers The fix is to trim on replica during re-sync, starting from the global checkpoint instead of max seqNo, like how it's done in the finalizeRecovery |
|
❌ Gradle check result for 5c8b37d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
5c8b37d to
7ca112a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
CHANGELOG.md (1)
41-42: Deduplicate and fix the forward‑translog changelog entryThere are two identical bullets for
forward translog reading, and the first one is still missing the closing). Keep a single, correctly formatted entry:- - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163) - - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163)) + - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163))
🧹 Nitpick comments (3)
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (1)
43-76: Forward traversal +seenSeqNonow prefers older ops if duplicates slip throughThe iterator setup and
index/-1sentinel look correct for both directions, and backward traversal preserves the prior behavior. One subtle change in semantics:
- With
readForward == false, we still visit newer generations first, soseenSeqNoensures the newest op for a given seqNo “wins”.- With
readForward == true, we now visit older generations first; if any duplicate seqNo operations remain (e.g., from incomplete trimming),seenSeqNowill keep the first/oldest op and treat later/newer ones as overridden.The design here assumes that, under
INDEX_TRANSLOG_READ_FORWARD_SETTING == true,trimOperationOfPreviousPrimaryTerms(...)guarantees that duplicates for a given seqNo have already been removed, so this reversal never becomes observable. If that invariant ever regresses, forward replay would silently favor stale operations.Consider either:
- Asserting in forward mode that no duplicate seqNos are seen, or
- Adjusting the dedup strategy when
readForward == true(or disabling it entirely there) so that newer operations cannot be shadowed if duplicates do appear.Also applies to: 88-101
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)
638-704: Forward‑read seqNo‑collision test mirrors backward case and looks correct
testSeqNoCollisionWithReadForwardfaithfully mirrorstestSeqNoCollisionwhile enablingINDEX_TRANSLOG_READ_FORWARD_SETTING, validating that the resync + trim logic still yields a single higher‑term op and that peer recovery only transfers non‑overridden operations with zero skipped ops. This is the right regression test for the failure described in the PR.If you find more of these forward/backward pairs accumulating, it might be worth extracting a small helper parameterized by
readForwardto avoid test drift, but that’s optional here.server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)
3876-3934: Forward snapshot test correctly exercises read‑forward orderingThis forward‑only test mirrors
testSnapshotReadOperationInReverseand validates that, withINDEX_TRANSLOG_READ_FORWARD_SETTINGenabled,newSnapshot()yields operations in per‑generation insertion order (gen0, gen1, …), which is exactly what MultiSnapshot’s forward iterator is supposed to do.One small clarity tweak you could consider: instead of wiring
() -> globalCheckpoint.get()from the class field, pass a fixed supplier (e.g.() -> SequenceNumbers.NO_OPS_PERFORMED) to theLocalTranslogconstructed here so the test is fully independent of any prior changes toglobalCheckpoint. Behavior is unchanged today but the intent would be more explicit.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java(1 hunks)server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java(4 hunks)server/src/main/java/org/opensearch/index/translog/Translog.java(1 hunks)server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java(1 hunks)server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java(1 hunks)server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java(1 hunks)server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java
- server/src/main/java/org/opensearch/index/translog/Translog.java
- server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)
libs/core/src/main/java/org/opensearch/core/xcontent/MediaTypeRegistry.java (1)
MediaTypeRegistry(57-418)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: assemble (21, ubuntu-latest)
🔇 Additional comments (6)
server/src/main/java/org/opensearch/index/IndexSettings.java (4)
199-218: LGTM! Setting declaration is well-designed.The setting is properly declared with comprehensive documentation. The non-dynamic nature (absence of
Property.Dynamic) is appropriate, as changing translog read direction mid-operation could lead to inconsistencies. The Javadoc clearly explains the edge cases around stale operations during recovery, which aligns with the concerns raised in the PR discussion.
915-915: LGTM! Field declaration follows conventions.The field is appropriately declared as
private final, which is consistent with other non-dynamic settings in this class and ensures the value cannot be modified after initialization.
1136-1136: LGTM! Field initialization is correct.The field is properly initialized in the constructor using the standard
Setting.get(settings)pattern, consistent with other non-dynamic settings in this class.
2103-2108: LGTM! Accessor method is correctly implemented.The method follows the class conventions with appropriate naming (
isTranslogReadForward) and provides clear Javadoc. The placement near similar accessor methods likeisSoftDeleteEnabled()maintains good code organization.server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java (1)
202-205: Rolling translog generation before trimming aligns with trim invariantsCalling
replica.rollTranslogGeneration()beforetrimOperationOfPreviousPrimaryTerms(...)is consistent with the translog trim invariants (e.g., avoiding trimming from the current writer generation) and matches how trimming is exercised in tests with forward reading. Looks good.server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java (1)
126-142: ExpectingtrimAboveSeqNo== global checkpoint matches new resync semanticsUpdating the assertion to
equalTo(globalCheckPoint)correctly reflects the new behavior where resync trimming starts from the global checkpoint rather thanmaxSeqNo. This keeps the test consistent with the updatedPrimaryReplicaSyncerlogic.
|
❌ Gradle check result for 7ca112a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
Signed-off-by: xuxiong1 <[email protected]>
7ca112a to
79c1e0a
Compare
Signed-off-by: xuxiong1 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
CHANGELOG.md (1)
41-42: Remove duplicated/malformed changelog entry for forward translog readingYou currently have two entries for the same feature, and the first is still missing the closing parenthesis. Keep a single, well-formed line.
- - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163) - - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163)) + - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163))
🧹 Nitpick comments (1)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)
638-704: Consider refactoring to reduce duplication.The
testSeqNoCollisionWithReadForward()test is nearly identical totestSeqNoCollision()(lines 571-636), differing only in enabling theINDEX_TRANSLOG_READ_FORWARD_SETTING. While duplicating tests to cover feature flag variations is a common and acceptable pattern, you could reduce maintenance burden by extracting the shared logic into a parameterized test or helper method.Example refactor:
@ParameterizedTest @ValueSource(booleans = {false, true}) public void testSeqNoCollision(boolean readForward) throws Exception { Settings.Builder settingsBuilder = Settings.builder() .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true) .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1"); if (readForward) { settingsBuilder.put(IndexSettings.INDEX_TRANSLOG_READ_FORWARD_SETTING.getKey(), true); } try (ReplicationGroup shards = createGroup(2, settingsBuilder.build())) { // ... shared test logic ... } }That said, the current approach is perfectly acceptable and may be preferred if you want each scenario to be independently debuggable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java(1 hunks)server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java(4 hunks)server/src/main/java/org/opensearch/index/translog/Translog.java(1 hunks)server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java(1 hunks)server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java(1 hunks)server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java(1 hunks)server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- server/src/main/java/org/opensearch/index/translog/Translog.java
- server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: Analyze (java)
🔇 Additional comments (6)
server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java (1)
184-205: Rolling translog generation before trimming looks correct and matches trim invariantsCalling
replica.rollTranslogGeneration()beforetrimOperationOfPreviousPrimaryTermsensures no mixed-term operations remain in the current writer when trimming runs, which matches theLocalTranslogtrim invariants and avoids stale previous-primary-term ops when forward-reading the translog. The sequencing here (apply ops → roll → trim) looks sound.server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (1)
43-45: Iterator-based bidirectional traversal correctly preserves snapshot and dedup semanticsThe new
PrimitiveIterator+indexstate cleanly iterates snapshots in either 0..n−1 or n−1..0 order, fully exhausting eachTranslogSnapshotbefore advancing, while reusing the existingseenSeqNodedup logic. This fixes the earlier “one op per snapshot” issue and looks correct for both forward and backward reads.Also applies to: 57-76, 88-101
server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java (1)
319-359: UsingstartingSeqNo - 1astrimAboveSeqNocorrectly aligns trimming with the global checkpointDeriving
trimAboveSeqNofromstartingSeqNo - 1(i.e., the last known global checkpoint) and always sending the first request even when there are no ops ensures replicas trim previous-primary-term operations for all seqNos above the global checkpoint, fixing the earlier “trim from max seqNo only” gap without affecting later batches.server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1)
533-572: Forward-read recovery trimming test mirrors the existing reverse-read coverage
testRecoveryTrimsLocalTranslogWithReadForwardis a straightforward forward-read clone oftestRecoveryTrimsLocalTranslog, correctly wiringINDEX_TRANSLOG_READ_FORWARD_SETTINGand asserting Lucene/translog consistency and history equality after promotion/demotion cycles. Looks good.server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)
3876-3934: Forward snapshot test correctly validates generation-order iteration
testSnapshotReadOperationForwardcreates a dedicated translog withINDEX_TRANSLOG_READ_FORWARD_SETTINGenabled, writes multiple generations while tracking per-generation ops, and then asserts thatnewSnapshot()yields the concatenated gen0→genN sequence. This is the right assertion for the forward-read path and cleanly exercises the new iteration logic.server/src/main/java/org/opensearch/index/IndexSettings.java (1)
199-218: Excellent documentation for the new setting.The comprehensive Javadoc clearly explains the behavior, default semantics, and edge case risks associated with forward reading. The choice to make this a non-dynamic index-scoped setting (cannot be changed after index creation) is appropriate given the complexity of the feature and the edge cases mentioned.
|
❕ Gradle check result for 78bc921: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Description
This commit introduces a feature to read translog operations in forward order
(oldest to newest) instead of the default backward order (newest to oldest).
Changes:
index.translog.read_forwardsetting (default: false) inIndexSettingsTests:
Related Issues
Resolves #20094
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.